dblogr.com/

Calculating Heritability

An R tutorial on how to calculate heritability


Introduction

Heritability: an estimate of the degree of phenotypic variation due to genetic variation.

Phenotypic variance, the variation for a particular trait, can be attributed to both genetic and environmental factors. This leads to the concept of Heritability, the degree of variation in a phenotypic trait in a population that is due to genetic variation between individuals in that population. There are two types of heritability used by researchers: Broad Sense Heritability (\(H^2\)) and Narrow Sense Heritability (\(h^2\)).

Broad Sense Heritability:

\(H^2=\frac{V_G}{V_P}=\frac{V_G}{V_G+V_E}\)

where:

  • \(V_G\) = Genetic variance
  • \(V_E\) = Environmental variance
  • \(V_P\) = Phenotypic variance

\(V_P=V_G+V_E\)

Narrow Sense Heritability:

\(h^2=\frac{V_A}{V_P}\)

where:

  • \(V_A\) = Additive variance
  • \(V_P\) = Phenotypic variance

Breeders Equation:

\(R=h^2S\)

which can be rearranged to calculate narrow sense heritability:

\(h^2=R/S\)

where:

  • \(S\) = Selection differential
  • \(R\) = Response to selection (genetic gain)

\(S=Parent_{selected}-Parent_{mean}\)

\(R=Progeny_{mean}-Parent_{mean}\)

source: Wikipedia


Prepare Data

# devtools::install_github("derekmichaelwright/agData")
library(agData)
library(ggridges)
library(lme4)

Johannesen 1903

Extra links

# Prep data
myCaption1 <- paste("www.dblogr.com/ or derekmichaelwright.github.io/dblogr/ | Data: Johannsen (1903)")
d1 <- read.csv("data_johannsen1903_1.csv") 
d2 <- read.csv("data_johannsen1903_2.csv")

Selection requires genetic variance

In a pure-line, without genetic variance, there will not be any room for selection.

Johannsen showed this by selecting for different sized seeds from a pure bred line and showed that this did not have any selection potential.

DT::datatable(d1)

# Prep data
myTitle <- "Populations derived from a pure bred Phaseolus vulgaris line"
xx <- d1 %>% gather(`Materal Seed Size`, Count, 2:ncol(.)) %>%
  mutate(`Materal Seed Size` = substr(`Materal Seed Size`, 2, 8),
         `Materal Seed Size` = gsub("\\.", "-", `Materal Seed Size`),
         pMin = as.numeric(substr(`Materal Seed Size`, 1, 3)),
         pMax = as.numeric(substr(`Materal Seed Size`, 5, 7)))
xm <- xx %>% filter(!is.na(Count)) %>% uncount(Count) %>% group_by(`Materal Seed Size`) %>% 
  summarise(Seed.Mass = mean(Seed.Mass))
# Plot
mp <- ggplot(xx, aes(x = Seed.Mass, y = Count)) +
  geom_rect(aes(fill = `Materal Seed Size`, xmin = pMin, xmax = pMax), 
            ymin = Inf, ymax = -Inf, alpha = 0.7, color = "black") +
  geom_col(alpha = 0.7, color = "black") +
  geom_vline(data = xm, aes(xintercept = Seed.Mass)) +
  facet_grid(`Materal Seed Size` ~ ., scales = "free_y") +
  scale_x_continuous(breaks = seq(100, 900, by = 100)) +
  guides(fill = guide_legend(nrow = 1)) +
  theme_agData(legend.position = "bottom") +
  labs(title = myTitle, y = NULL, x = "Progeny Seed Size (mg)", caption = myCaption1)
ggsave("heritability_1_01.png", mp, width = 8, height = 8)

# Prep data
xx <- xm %>% rename(`Progeny Mean`=Seed.Mass) %>%
  mutate(`Parent Mean` = seq(200, 700, by = 100))
myfit <- lm(`Progeny Mean` ~ `Parent Mean`, data = xx)
# Plot
mp <- ggplot(xx, aes(y = `Progeny Mean`, x = `Parent Mean`)) +
  geom_smooth(method = "lm", se = F, color = "black", alpha = 0.7) +
  geom_point(size = 2.5, alpha = 0.7) +
  facet_wrap(paste("h^2 =", round(myfit$coefficients[2], 2)) ~ .) +
  scale_x_continuous(breaks = seq(200,600, by = 200), limits = c(0,900)) + 
  scale_y_continuous(breaks = seq(200,600, by = 200), limits = c(150,750)) +
  theme_agData() +
  labs(title = "Seed Size", subtitle = myTitle, caption = myCaption1)
ggsave("heritability_1_02.png", mp, width = 6, height = 4)  

Selection is possible with genetic variance

If instead we look to different populations for seed size variation, then we can have selection potential and measure heritability.

DT::datatable(d2)

# Prep data
myTitle <- "Selection attempts in a pure bred Phaseolus Vulgaris line"
xx <- d2 %>% gather(`Maternal Mean`, Count, 2:ncol(.)) %>%
  mutate(Line = substr(`Maternal Mean`, 1, 1),
         `Maternal Mean` = as.numeric(substr(`Maternal Mean`, 2, 4))) %>%
  arrange(`Maternal Mean`) %>%
  mutate(Line = factor(Line, levels = unique(Line)))
xm <- xx %>% filter(!is.na(Count)) %>% uncount(Count) %>% 
  group_by(Line, `Maternal Mean`) %>% 
  summarise(`Progeny Mean` = mean(Seed.Mass)) %>% 
  gather(Trait, Seed.Mass, 2:3)
# Plot
mp <- ggplot(xx, aes(x = Seed.Mass, y = Count)) +
  geom_col(alpha = 0.7) +
  geom_density(stat = "identity") +
  geom_vline(data = xm, aes(xintercept = Seed.Mass, color = Trait)) +
  facet_grid(Line ~ ., scales = "free_y") +
  scale_x_continuous(breaks = seq(100, 900, by = 100)) +
  scale_color_manual(name = NULL, values = c("darkgreen","black")) +
  theme_agData(legend.position = "bottom") +
  labs(title = , y = NULL, x = "mg", caption = myCaption1)
ggsave("heritability_2_01.png", mp, width = 8, height = 12)

# Prep data
xx <- xm %>% spread(Trait, Seed.Mass) 
myfit <- lm(`Progeny Mean` ~ `Maternal Mean`, data = xx)
# Plot
mp <- ggplot(xx, aes(y = `Progeny Mean`, x = `Maternal Mean`)) +
  geom_smooth(method = "lm", se = F, color = "black", alpha = 0.7) +
  geom_point(size = 2.5, alpha = 0.7) +
  facet_wrap(paste("h^2 =", round(myfit$coefficients[2], 2)) ~ .) +
  scale_x_continuous(breaks = seq(200,600, by = 200), limits = c(0,900)) + 
  scale_y_continuous(breaks = seq(200,600, by = 200), limits = c(150,750)) +
  theme_agData() +
  labs(title = "Seed Size", subtitle = myTitle, caption = myCaption1)
ggsave("heritability_2_02.png", mp, width = 6, height = 4)

East 1916


All Data

# Prep data
myCaption3 <- paste("www.dblogr.com/ or derekmichaelwright.github.io/dblogr/ | Data: East (1916)")
d3 <- read.csv("data_east1916.csv")
DT::datatable(d3)
d3.1 <- d3 %>%
  mutate(Name = factor(Name, levels = rev(unique(.$Name)))) %>%
  gather(Size, Num, 7:ncol(.)) %>%
  mutate(Size = as.numeric(gsub("mm", "", Size))) %>%
  filter(!is.na(Num)) %>% arrange(Name, Year)

# Plot
mp <- ggplot(d3.1, aes(x = Size, y = Name)) +
  geom_tile(aes(fill = Num), color = "black", alpha = 0.7, width = 3) +
  geom_text(aes(label = Num)) +
  geom_tile(data = d3, aes(x = ParentSize), color = "black", fill = NA, width = 1) +
  facet_grid(Year ~ ., scale = "free_y", space = "free_y") +
  scale_fill_gradient(low = "grey", high = "darkred") +
  theme_agData(legend.position = "none") +
  labs(caption = myCaption3)
ggsave("heritability_3_01.png", mp, width = 8, height = 6)

DT::datatable(d3)

Large Seed Selection

# Prep data
xx <- d3.1 %>% filter(Selection != "Small" | (Generation %in% c("P1","P2") & Year == 1911))
# Plot
mp <- ggplot(xx, aes(x = Size, y = Name, fill = Num, label = Num)) +
  geom_tile(color = "black", alpha = 0.7, width = 3) +
  geom_text() +
  facet_grid(Year ~ ., scale = "free_y", space = "free_y") +
  scale_fill_gradient(low = "grey", high = "darkred") +
  theme_agData(legend.position = "none") +
  labs(caption = myCaption3)
ggsave("heritability_3_02.png", mp, width = 8, height = 6)

# Prep data
myLines <- c("No. 383", "No. 330", "No. 383 & No. 330", "No. 383 x 330", "No. (383 x 330) 1" ,
             "No. (383 x 330) 1-3",  "No. (383 x 330) 1-3-1", "No. (383 x 330) 1-3-1-1")
xx <- d3.1 %>% filter(Selection != "Small", Name %in% myLines) %>%
  group_by(Name, Size) %>% uncount(Num)
x1 <- xx %>% group_by(Name) %>% summarise(Mean = mean(Size))
xx <- xx %>% left_join(x1, by = "Name")
# Plot
mp <- ggplot(xx, aes(x = Size, y = Name, fill = Mean)) +
  geom_density_ridges(alpha = 0.5, scale = 1.5, bandwidth = 1.5) +
  scale_fill_gradientn(colours = c("darkred", "darkgoldenrod2", "steelblue")) +
  theme_agData(legend.position = "none") +
  labs(caption = myCaption3)
ggsave("heritability_3_03.png", mp, width = 10, height = 6)

Small Seed Selection

# Prep data
xx <- d3.1 %>% filter(Selection != "Large" | (Generation %in% c("P1","P2") & Year == 1911))
# Plot
mp <- ggplot(xx, aes(x = Size, y = Name, fill = Num, label = Num)) +
  geom_tile(color = "black", alpha = 0.7, width = 3) +
  geom_text() +
  facet_grid(Year ~ ., scale = "free_y", space = "free_y") +
  scale_fill_gradient(low = "grey", high = "darkred") +
  theme_agData(legend.position = "none") +
  labs(caption = myCaption3)
ggsave("heritability_3_04.png", mp, width = 8, height = 6)

# Prep data
myLines <- c("No. 383", "No. 330", "No. 383 & No. 330", "No. 383 x 330", "No. (383 x 330) 1" ,
             "No. (383 x 330) 1-3",  "No. (383 x 330) 1-3-1", "No. (383 x 330) 1-3-1-1")
xx <- d3.1 %>% filter(Selection != "Large", Name %in% myLines) %>%
  group_by(Name, Size) %>% uncount(Num)
x1 <- xx %>% group_by(Name) %>% summarise(Mean = mean(Size))
xx <- xx %>% left_join(x1, by = "Name")
# Plot
mp <- ggplot(xx, aes(x = Size, y = Name, fill = Mean)) +
  geom_density_ridges(alpha = 0.5, scale = 1.5, bandwidth = 1.5) +
  scale_fill_gradientn(colours = c("darkred", "darkgoldenrod2", "steelblue")) +
  theme_agData(legend.position = "none") +
  labs(caption = myCaption3)
ggsave("heritability_3_05.png", mp, width = 10, height = 6)

Calculate h^2 With Breeders Equation

# Prep data
xx <- d3.1 %>% 
  group_by(Name, Selection, Parents, ParentSize) %>%
  summarise(Mean = sum(Num * Size) / sum(Num),
            Mean = round(Mean, 1)) %>%
  ungroup() %>% arrange(desc(Name))
DT::datatable(xx)
xx <- xx %>%
  left_join(select(., Parents=Name, ParentMean=Mean), by = "Parents") %>%
  mutate(R = abs(Mean - ParentMean),
         S = abs(ParentSize - ParentMean),
         h2 = R / S,
         h2 = round(h2, 2))
DT::datatable(xx)

Calcaulte h^2 with Linear Regression

By rearranging the breeders formula, narrow sense heritability can be determined by calculating the slope of a regression line of progeny mean as a function of parental mean.

\(h^2=R/S\)

# Prep data
myfit <- lm(Mean ~ ParentSize, xx)
# Plot
mp <- ggplot(xx %>% filter(Selection != ""), 
             aes(x = ParentSize, y = Mean, color = Selection)) +
  geom_smooth(method = "lm", se = F, color = "black", alpha = 0.7) +
  geom_point(size = 2.5, alpha = 0.7) +
  facet_wrap(paste("h^2 =", round(myfit$coefficients[2], 2)) ~ .) +
  scale_color_manual(values = c("darkgreen","steelblue")) +
  ylim(c(min(xx$Mean), max(xx$Mean))) + 
  xlim(c(min(xx$Mean), max(xx$Mean))) +
  theme_agData(legend.position = "bottom") +
  labs(x = "Parental Mean", y = "Progeny Mean", caption = myCaption3)
ggsave("heritability_3_06.png", mp, width = 6, height = 4)

# Prep data
myfit_L <- lm(Mean ~ ParentSize, xx %>% filter(Selection == "Large"))
myfit_S <- lm(Mean ~ ParentSize, xx %>% filter(Selection == "Small"))
xx <- xx %>%
  mutate(H2 = ifelse(Selection == "Large",
            paste("Large h^2 =", round(myfit_L$coefficients[2], 2)),
            paste("Small h^2 =", round(myfit_S$coefficients[2], 2))) )
# Plot
mp <- ggplot(xx %>% filter(Selection != ""), 
             aes(x = ParentSize, y = Mean, color = Selection)) +
  geom_smooth(method = "lm", se = F, color = "black", alpha = 0.7) +
  geom_point(size = 2, alpha = 0.7) +
  facet_grid(. ~ Selection + H2) +
  scale_color_manual(values = c("darkgreen","steelblue")) +
  ylim(c(min(xx$Mean), max(xx$Mean))) + 
  xlim(c(min(xx$Mean), max(xx$Mean))) +
  theme_agData(legend.position = "none") +
  labs(x = "Parental Mean", y = "Progeny Mean", caption = myCaption3)
ggsave("heritability_3_07.png", mp, width = 6, height = 4)

Tomato Example

myCaption4 <- expression(paste("www.dblogr.com/ or derekmichaelwright.github.io/dblogr/ | Data: Collins ", 
                               italic("et al"), ". (1930)"))
d4 <- read.csv("data_tbrtquality.csv") %>% 
  mutate(Rep = factor(Rep),
         Expt = paste(Loc, Year),
         Brix = as.numeric(Brix))

Brix: a measure of the dissolved solids in a liquid, and is commonly used to measure dissolved sugar content of an aqueous solution. One degree Brix is 1 gram of sucrose in 100 grams of solution and represents the strength of the solution as percentage by mass.

# Plot
mp <- ggplot(d4, aes(x = Rep, y = Brix, color = Rep)) +
  geom_quasirandom(alpha = 0.7) +
  facet_grid(. ~ paste(Year, Loc)) +
  scale_color_manual(values = c("darkgreen","steelblue")) +
  theme_agData(legend.position = "none") +
  labs(caption = myCaption4)
ggsave("heritability_4_01.png", mp, width = 6, height = 4)

# Prep data
myColors <- c("darkgreen", "steelblue", "darkred", "darkblue")
xx <- d4 %>% group_by(Line, Expt) %>% 
  summarise(Mean = mean(Brix, na.rm = T),
            Max  =  max(Brix, na.rm = T),
            Min  =  min(Brix, na.rm = T)) %>% 
  ungroup() %>% arrange(Mean) %>%
  mutate(Line = factor(Line, levels = unique(.$Line))) %>%
  filter(!is.na(Mean))
# Plot
mp <- ggplot(xx, aes(x = Line, color = Expt)) +
  geom_point(aes(y = Mean)) +
  geom_segment(aes(xend = Line, y = Min, yend = Max)) +
  scale_color_manual(values = myColors) +
  facet_grid(Expt ~ .) +
  theme_agData(legend.position = "bottom",
               axis.text.x = element_text(angle = 45, hjust = 1, size = 4)) +
  labs(y = "Brix", caption = myCaption4)
ggsave("heritability_4_02.png", mp, width = 10, height = 6)

# Prep data
x1 <- xx %>% filter(Expt == "CA 2010") %>%
  top_n(-Mean, n = 10)
x2 <- xx %>% filter(Expt == "CA 2010") %>%
  top_n(Mean, n = 10)
yy <- d4 %>% filter(Line %in% c(x1$Line, x2$Line)) %>%
  mutate(Line = factor(Line, levels = c(x1$Line, x2$Line)),
         Group = ifelse(Line %in% x1$Line, "Low", "High"),
         Group = factor(Group, levels = c("Low", "High")))
# Plot
mp <- ggplot(yy, aes(x = Group, y = Brix, color = Group)) +
  geom_quasirandom(alpha = 0.8) +
  facet_grid(. ~ Expt) +
  scale_color_manual(values = c("darkred", "steelblue")) +
  theme_agData(legend.position = "none",
               axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(x = NULL, caption = myCaption4)
ggsave("heritability_4_03.png", mp, width = 6, height = 4)

Lets run an anova to determine environmental variance.

# ANOVA
myfit <- lmer(Brix ~ (1 | Line) + (1 | Loc) + (1 | Year) + (1 | Line:Loc) + (1 | Line:Year), data = d4)
myfit_aov <- summary(myfit)
myfit_aov
## Linear mixed model fit by REML ['lmerMod']
## Formula: Brix ~ (1 | Line) + (1 | Loc) + (1 | Year) + (1 | Line:Loc) +      (1 | Line:Year)
##    Data: d4
## 
## REML criterion at convergence: 1742.1
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.0258 -0.6132 -0.0533  0.5293  5.0653 
## 
## Random effects:
##  Groups    Name        Variance Std.Dev.
##  Line:Year (Intercept) 0.01021  0.10105 
##  Line:Loc  (Intercept) 0.00948  0.09736 
##  Line      (Intercept) 0.19193  0.43810 
##  Year      (Intercept) 0.03164  0.17787 
##  Loc       (Intercept) 0.18848  0.43414 
##  Residual              0.25782  0.50776 
## Number of obs: 967, groups:  Line:Year, 284; Line:Loc, 281; Line, 143; Year, 2; Loc, 2
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)   5.2817     0.3343    15.8

\(Heritability = \frac{var(Line)}{ var(Line) + \frac{var(Line:Loc)}{2} + \frac{var(Line:Year)}{2} + \frac{var(Residual)}{4}}\)

\(H^2 = \frac{0.1919}{0.1919 + \frac{0.0095}{2} + \frac{0.0102}{2} + \frac{0.2578}{4}} = 0.72\)

Vg <- myfit_aov$varcor$Line[1]
Ve <- myfit_aov$varcor$`Line:Loc`[1] / 2 + 
      myfit_aov$varcor$`Line:Year`[1] / 2 + 
      myfit_aov$sigma^2 / 4
Vg / (Vg + Ve)
## [1] 0.7209216

BLUPs

myBLUPs <- ranef(myfit)$Line
mp <- ggplot(myBLUPs, aes(x = `(Intercept)`)) +
  geom_histogram(color = "black", fill = "darkred", alpha = 0.7) +
  labs(caption = myCaption4)
ggsave("heritability_4_04.png", mp, width = 6, height = 4)

© Derek Michael Wright